Overview

Dataset statistics

Number of variables: 46
Number of observations: 5783109
Missing cells: 41028593
Missing cells (%): 15.4%
Duplicate rows: 7564
Duplicate rows (%): 0.1%
Total size in memory: 2.0 GiB
Average record size in memory: 368.0 B
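
The percentages in this table can be re-derived from the raw counts. The sketch below assumes that "Missing cells (%)" is taken over all cells (variables × observations) and that the total memory size is roughly the average record size times the number of observations; both are assumptions about how the report computes these figures, and the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 46
n_observations = 5_783_109
missing_cells = 41_028_593
duplicate_rows = 7_564
avg_record_bytes = 368.0

# Assumption: the missing percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows as a share of all observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: total size is about average record size times row count.
approx_total_gib = avg_record_bytes * n_observations / 2**30

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Approximate total size: {approx_total_gib:.1f} GiB")
```

Under these assumptions the three values round to the figures shown in the table.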

Variable types

Categorical: 40
Numeric: 6

Alerts

NU_ANO has constant value "2020" Constant
Dataset has 7564 (0.1%) duplicate rows Duplicates
NO_MUNICIPIO_ESC has a high cardinality: 5268 distinct values High cardinality
NO_MUNICIPIO_PROVA has a high cardinality: 1712 distinct values High cardinality
Q018 is highly correlated with NU_ANO High correlation
Q006 is highly correlated with NU_ANO High correlation
TP_ST_CONCLUSAO is highly correlated with IN_TREINEIRO and 7 other fields High correlation
IN_TREINEIRO is highly correlated with TP_ST_CONCLUSAO and 6 other fields High correlation
Q001 is highly correlated with NU_ANO High correlation
Q024 is highly correlated with NU_ANO High correlation
Q022 is highly correlated with NU_ANO High correlation
TP_LOCALIZACAO_ESC is highly correlated with TP_ST_CONCLUSAO and 2 other fields High correlation
Q020 is highly correlated with NU_ANO High correlation
Q014 is highly correlated with NU_ANO High correlation
Q012 is highly correlated with NU_ANO High correlation
Q025 is highly correlated with NU_ANO High correlation
Q013 is highly correlated with NU_ANO High correlation
TP_ENSINO is highly correlated with TP_ST_CONCLUSAO and 2 other fields High correlation
Q017 is highly correlated with NU_ANO High correlation
Q016 is highly correlated with NU_ANO High correlation
Q002 is highly correlated with NU_ANO High correlation
TP_SEXO is highly correlated with NU_ANO High correlation
SG_UF_ESC is highly correlated with TP_ST_CONCLUSAO and 3 other fields High correlation
TP_ESCOLA is highly correlated with TP_ST_CONCLUSAO and 2 other fields High correlation
Q008 is highly correlated with NU_ANO High correlation
falta is highly correlated with NU_ANO High correlation
Q015 is highly correlated with NU_ANO High correlation
NU_ANO is highly correlated with Q018 and 36 other fields High correlation
TP_NACIONALIDADE is highly correlated with NU_ANO High correlation
Q019 is highly correlated with NU_ANO High correlation
Q023 is highly correlated with NU_ANO High correlation
TP_SIT_FUNC_ESC is highly correlated with TP_ST_CONCLUSAO and 2 other fields High correlation
Q011 is highly correlated with NU_ANO High correlation
Q021 is highly correlated with NU_ANO High correlation
Q009 is highly correlated with NU_ANO High correlation
TP_DEPENDENCIA_ADM_ESC is highly correlated with TP_ST_CONCLUSAO and 3 other fields High correlation
Q003 is highly correlated with NU_ANO High correlation
Q010 is highly correlated with NU_ANO High correlation
TP_ESTADO_CIVIL is highly correlated with NU_ANO High correlation
SG_UF_PROVA is highly correlated with SG_UF_ESC and 1 other field High correlation
Q004 is highly correlated with NU_ANO High correlation
Q007 is highly correlated with NU_ANO High correlation
TP_ENSINO has 4479663 (77.5%) missing values Missing
CO_MUNICIPIO_ESC has 4878540 (84.4%) missing values Missing
NO_MUNICIPIO_ESC has 4878540 (84.4%) missing values Missing
CO_UF_ESC has 4878540 (84.4%) missing values Missing
SG_UF_ESC has 4878540 (84.4%) missing values Missing
TP_DEPENDENCIA_ADM_ESC has 4878540 (84.4%) missing values Missing
TP_LOCALIZACAO_ESC has 4878540 (84.4%) missing values Missing
TP_SIT_FUNC_ESC has 4878540 (84.4%) missing values Missing
Q001 has 95966 (1.7%) missing values Missing
Q002 has 95966 (1.7%) missing values Missing
Q003 has 95966 (1.7%) missing values Missing
Q004 has 95966 (1.7%) missing values Missing
Q005 has 95966 (1.7%) missing values Missing
Q006 has 95966 (1.7%) missing values Missing
Q007 has 95966 (1.7%) missing values Missing
Q008 has 95966 (1.7%) missing values Missing
Q009 has 95966 (1.7%) missing values Missing
Q010 has 95966 (1.7%) missing values Missing
Q011 has 95966 (1.7%) missing values Missing
Q012 has 95966 (1.7%) missing values Missing
Q013 has 95966 (1.7%) missing values Missing
Q014 has 95966 (1.7%) missing values Missing
Q015 has 95966 (1.7%) missing values Missing
Q016 has 95966 (1.7%) missing values Missing
Q017 has 95966 (1.7%) missing values Missing
Q018 has 95966 (1.7%) missing values Missing
Q019 has 95966 (1.7%) missing values Missing
Q020 has 95966 (1.7%) missing values Missing
Q021 has 95966 (1.7%) missing values Missing
Q022 has 95966 (1.7%) missing values Missing
Q023 has 95966 (1.7%) missing values Missing
Q024 has 95966 (1.7%) missing values Missing
Q025 has 95966 (1.7%) missing values Missing
TP_COR_RACA has 116883 (2.0%) zeros Zeros
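
The "Missing" alerts above fall into three groups that share an identical count. The sketch below groups the alerted columns by that count and sums them for comparison with the "Missing cells" figure in the overview. Identical counts suggest, but do not prove, that the same rows are missing across a group; the dictionary and variable names are illustrative only.

```python
from collections import defaultdict

# Missing-value counts copied from the "Missing" alerts above.
school_columns = [
    "CO_MUNICIPIO_ESC", "NO_MUNICIPIO_ESC", "CO_UF_ESC", "SG_UF_ESC",
    "TP_DEPENDENCIA_ADM_ESC", "TP_LOCALIZACAO_ESC", "TP_SIT_FUNC_ESC",
]
missing = {"TP_ENSINO": 4_479_663}
missing.update({name: 4_878_540 for name in school_columns})
missing.update({f"Q{i:03d}": 95_966 for i in range(1, 26)})

# Group columns that share exactly the same missing count.
groups = defaultdict(list)
for name, count in missing.items():
    groups[count].append(name)

for count, names in sorted(groups.items(), reverse=True):
    print(f"{count:>9} missing in {len(names)} column(s)")

# Sum over all alerted columns, for comparison with "Missing cells".
total_missing = sum(missing.values())
print(total_missing)
```

The sum over the alerted columns equals the 41028593 missing cells reported in the overview, so these 33 columns account for every missing cell in the dataset.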

Reproduction

Analysis started: 2022-04-14 13:34:18.345291
Analysis finished: 2022-04-14 14:13:19.198966
Duration: 39 minutes and 0.85 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
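
As a quick consistency check, the reported duration can be recomputed from the two timestamps above. This is a minimal sketch that uses only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section above.
started = datetime.fromisoformat("2022-04-14 13:34:18.345291")
finished = datetime.fromisoformat("2022-04-14 14:13:19.198966")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```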

Variables

NU_ANO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 MiB

Value  Count
2020   5783109

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020
2nd row: 2020
3rd row: 2020
4th row: 2020
5th row: 2020

Common Values

Value  Count    Frequency (%)
2020   5783109  100.0%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count    Frequency (%)
2020   5783109  100.0%

Most occurring characters, categories, scripts, and blocks: No values found.
TP_SEXO
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 MiB

Value  Count
F      3468805
M      2314304

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: M
3rd row: F
4th row: M
5th row: F

Common Values

Value  Count    Frequency (%)
F      3468805  60.0%
M      2314304  40.0%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count    Frequency (%)
f      3468805  60.0%
m      2314304  40.0%

Most occurring characters, categories, scripts, and blocks: No values found.

TP_ESTADO_CIVIL
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 MiB

Value  Count
1      4851310
2      527734
0      263848
3      131423
4      8794

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 2
4th row: 1
5th row: 1

Common Values

Value  Count    Frequency (%)
1      4851310  83.9%
2      527734   9.1%
0      263848   4.6%
3      131423   2.3%
4      8794     0.2%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count    Frequency (%)
1      4851310  83.9%
2      527734   9.1%
0      263848   4.6%
3      131423   2.3%
4      8794     0.2%

Most occurring characters, categories, scripts, and blocks: No values found.

TP_COR_RACA
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.146922356
Minimum: 0
Maximum: 5
Zeros: 116883
Zeros (%): 2.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 2
Q3: 3
95-th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.012237598
Coefficient of variation (CV): 0.4714830955
Kurtosis: -1.118815594
Mean: 2.146922356
Median Absolute Deviation (MAD): 1
Skewness: -0.1337619025
Sum: 12415886
Variance: 1.024624955
Monotonicity: Not monotonic

Figure: Histogram with fixed size bins (bins=6)

Most frequent values

Value  Count    Frequency (%)
3      2720485  47.0%
1      2007633  34.7%
2      771740   13.3%
4      128522   2.2%
0      116883   2.0%
5      37846    0.7%

Smallest values

Value  Count    Frequency (%)
0      116883   2.0%
1      2007633  34.7%
2      771740   13.3%
3      2720485  47.0%
4      128522   2.2%
5      37846    0.7%

Largest values

Value  Count    Frequency (%)
5      37846    0.7%
4      128522   2.2%
3      2720485  47.0%
2      771740   13.3%
1      2007633  34.7%
0      116883   2.0%
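
Because TP_COR_RACA takes only six distinct values, its summary statistics can be recomputed from the frequency table alone. The sketch below does so with the standard library. It assumes the reported variance is the sample variance (divisor n - 1), which is an assumption about the report's convention; the helper `value_at_rank` is hypothetical and written only for this example.

```python
from math import sqrt

# Value counts copied from the TP_COR_RACA frequency table above.
counts = {0: 116_883, 1: 2_007_633, 2: 771_740, 3: 2_720_485, 4: 128_522, 5: 37_846}

n = sum(counts.values())                       # number of observations
total = sum(v * c for v, c in counts.items())  # the "Sum" statistic
mean = total / n

# Assumption: sample variance, i.e. squared deviations divided by n - 1.
squared_dev = sum(c * (v - mean) ** 2 for v, c in counts.items())
variance = squared_dev / (n - 1)
std = sqrt(variance)


def value_at_rank(rank):
    """Return the value at a 0-based position in the sorted data."""
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if rank < seen:
            return value
    raise IndexError(rank)


# n is odd here, so the median is the single middle element.
median = value_at_rank((n - 1) // 2)

print(n, total, round(mean, 6), round(variance, 6), round(std, 6), median)
```

Working from the counts instead of the raw column keeps the check cheap: it touches six numbers, not 5.8 million rows.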

TP_NACIONALIDADE
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 MiB

Value  Count
1      5624622
2      139381
4      9411
3      8036
0      1659

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 2
4th row: 1
5th row: 2

Common Values

Value  Count    Frequency (%)
1      5624622  97.3%
2      139381   2.4%
4      9411     0.2%
3      8036     0.1%
0      1659     < 0.1%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count    Frequency (%)
1      5624622  97.3%
2      139381   2.4%
4      9411     0.2%
3      8036     0.1%
0      1659     < 0.1%

Most occurring characters, categories, scripts, and blocks: No values found.

TP_ST_CONCLUSAO
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 MiB

Value  Count
1      3794279
2      1395827
3      557425
4      35578

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value  Count    Frequency (%)
1      3794279  65.6%
2      1395827  24.1%
3      557425   9.6%
4      35578    0.6%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count    Frequency (%)
1      3794279  65.6%
2      1395827  24.1%
3      557425   9.6%
4      35578    0.6%

Most occurring characters, categories, scripts, and blocks: No values found.

TP_ESCOLA
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 MiB

Value  Count
1      4387282
2      1194496
3      201331

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value  Count    Frequency (%)
1      4387282  75.9%
2      1194496  20.7%
3      201331   3.5%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count    Frequency (%)
1      4387282  75.9%
2      1194496  20.7%
3      201331   3.5%

Most occurring characters, categories, scripts, and blocks: No values found.

TP_ENSINO
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 4479663
Missing (%): 77.5%
Memory size: 44.1 MiB

Value  Count
1.0    1294245
2.0    9201

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value      Count    Frequency (%)
1.0        1294245  22.4%
2.0        9201     0.2%
(Missing)  4479663  77.5%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count    Frequency (%)
1.0    1294245  99.3%
2.0    9201     0.7%

Most occurring characters, categories, scripts, and blocks: No values found.
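
The two TP_ENSINO tables report different percentages for the same counts (22.4% versus 99.3% for value 1.0). The sketch below reproduces both, assuming the first table divides by all observations and the second divides by non-missing observations only. That reading is inferred from the numbers, and the variable names are illustrative.

```python
# Counts copied from the TP_ENSINO "Common Values" table above.
counts = {"1.0": 1_294_245, "2.0": 9_201}
n_missing = 4_479_663

n_present = sum(counts.values())
n_total = n_present + n_missing

# Share of all observations (missing rows included in the denominator).
pct_of_all = {k: round(100 * v / n_total, 1) for k, v in counts.items()}

# Share of non-missing observations only.
pct_of_present = {k: round(100 * v / n_present, 1) for k, v in counts.items()}

print(pct_of_all)
print(pct_of_present)
```

When comparing category shares across variables with very different missing rates, it helps to state which denominator a percentage uses.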

IN_TREINEIRO
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 MiB

Value  Count
0      5225684
1      557425

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count    Frequency (%)
0      5225684  90.4%
1      557425   9.6%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count    Frequency (%)
0      5225684  90.4%
1      557425   9.6%

Most occurring characters, categories, scripts, and blocks: No values found.

CO_MUNICIPIO_ESC
Real number (ℝ≥0)

MISSING

Distinct: 5534
Distinct (%): 0.6%
Missing: 4878540
Missing (%): 84.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 3152161.298
Minimum: 1100015
Maximum: 5300108
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.1 MiB

Quantile statistics

Minimum: 1100015
5-th percentile: 1400100
Q1: 2312007
median: 3163706
Q3: 3550308
95-th percentile: 5208707
Maximum: 5300108
Range: 4200093
Interquartile range (IQR): 1238301

Descriptive statistics

Standard deviation: 1018560.811
Coefficient of variation (CV): 0.3231309296
Kurtosis: -0.2845590297
Mean: 3152161.298
Median Absolute Deviation (MAD): 657702
Skewness: 0.245296504
Sum: 2.851347393 × 10^12
Variance: 1.037466125 × 10^12
Monotonicity: Not monotonic

Figure: Histogram with fixed size bins (bins=50)

Most frequent values

Value                Count    Frequency (%)
3550308              38277    0.7%
2304400              27783    0.5%
3304557              26431    0.5%
5300108              19840    0.3%
1302603              17120    0.3%
3106200              11826    0.2%
2611606              10735    0.2%
2927408              10500    0.2%
4106902              9868     0.2%
1501402              9057     0.2%
Other values (5524)  723132   12.5%
(Missing)            4878540  84.4%

Smallest values

Value    Count  Frequency (%)
1100015  126    < 0.1%
1100023  580    < 0.1%
1100031  29     < 0.1%
1100049  811    < 0.1%
1100056  88     < 0.1%
1100064  202    < 0.1%
1100072  41     < 0.1%
1100080  83     < 0.1%
1100098  229    < 0.1%
1100106  241    < 0.1%

Largest values

Value    Count  Frequency (%)
5300108  19840  0.3%
5222302  40     < 0.1%
5222203  53     < 0.1%
5222054  75     < 0.1%
5222005  67     < 0.1%
5221908  24     < 0.1%
5221858  708    < 0.1%
5221809  82     < 0.1%
5221700  133    < 0.1%
5221601  304    < 0.1%
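
The municipality codes above are seven-digit integers whose range (1100015 to 5300108) lines up with the range of the two-digit state codes reported for CO_UF_ESC (11 to 53). The sketch below assumes the first two digits of a municipality code are the state (UF) code; the report itself does not state this relationship, and the function name `uf_code` is hypothetical.

```python
def uf_code(municipality_code: int) -> int:
    """Return the leading two digits of a seven-digit municipality code.

    Assumption: those two digits are the state (UF) code.
    """
    if not 1_000_000 <= municipality_code <= 9_999_999:
        raise ValueError("expected a seven-digit municipality code")
    return municipality_code // 100_000


# Most frequent CO_MUNICIPIO_ESC values from the table above.
for code in (3550308, 2304400, 3304557, 5300108, 1302603):
    print(code, uf_code(code))
```

If the assumption holds, CO_UF_ESC carries no information beyond CO_MUNICIPIO_ESC, which may be worth keeping in mind when reading the correlation alerts.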

NO_MUNICIPIO_ESC
Categorical

HIGH CARDINALITY
MISSING

Distinct: 5268
Distinct (%): 0.6%
Missing: 4878540
Missing (%): 84.4%
Memory size: 44.1 MiB

Value                Count
São Paulo            38277
Fortaleza            27783
Rio de Janeiro       26431
Brasília             19840
Manaus               17120
Other values (5263)  775118

Length

Max length: 32
Median length: 9
Mean length: 10.06354297
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 68
Unique (%): < 0.1%

Sample

1st row: Salvador
2nd row: Santana de Parnaíba
3rd row: Feira de Santana
4th row: Itapiranga
5th row: Oeiras

Common Values

Value                Count    Frequency (%)
São Paulo            38277    0.7%
Fortaleza            27783    0.5%
Rio de Janeiro       26431    0.5%
Brasília             19840    0.3%
Manaus               17120    0.3%
Belo Horizonte       11826    0.2%
Recife               10735    0.2%
Salvador             10500    0.2%
Curitiba             9868     0.2%
Belém                9137     0.2%
Other values (5258)  723052   12.5%
(Missing)            4878540  84.4%

Length

Figure: Histogram of lengths of the category

Value                Count    Frequency (%)
são                  85493    5.9%
de                   58111    4.0%
do                   50946    3.5%
rio                  40092    2.8%
paulo                39583    2.7%
fortaleza            27924    1.9%
janeiro              26431    1.8%
brasília             19932    1.4%
manaus               17120    1.2%
grande               13332    0.9%
Other values (3942)  1066933  73.8%

Most occurring characters, categories, scripts, and blocks: No values found.

CO_UF_ESC
Real number (ℝ≥0)

MISSING

Distinct: 27
Distinct (%): < 0.1%
Missing: 4878540
Missing (%): 84.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.37474864
Minimum: 11
Maximum: 53
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.1 MiB

Quantile statistics

Minimum: 11
5-th percentile: 14
Q1: 23
median: 31
Q3: 35
95-th percentile: 52
Maximum: 53
Range: 42
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 10.1578749
Coefficient of variation (CV): 0.3237595626
Kurtosis: -0.2567243777
Mean: 31.37474864
Median Absolute Deviation (MAD): 6
Skewness: 0.270976306
Sum: 28380625
Variance: 103.1824224
Monotonicity: Not monotonic

Figure: Histogram with fixed size bins (bins=27)

Most frequent values

Value              Count    Frequency (%)
35                 167183   2.9%
23                 104981   1.8%
31                 71614    1.2%
33                 62583    1.1%
26                 47583    0.8%
29                 47235    0.8%
41                 43060    0.7%
52                 39769    0.7%
43                 36478    0.6%
15                 33109    0.6%
Other values (17)  250974   4.3%
(Missing)          4878540  84.4%

Smallest values

Value  Count   Frequency (%)
11     9927    0.2%
12     3512    0.1%
13     30120   0.5%
14     2453    < 0.1%
15     33109   0.6%
16     3565    0.1%
17     8457    0.1%
21     26104   0.5%
22     15738   0.3%
23     104981  1.8%

Largest values

Value  Count   Frequency (%)
53     19840   0.3%
52     39769   0.7%
51     14674   0.3%
50     11851   0.2%
43     36478   0.6%
42     27373   0.5%
41     43060   0.7%
35     167183  2.9%
33     62583   1.1%
32     18419   0.3%

SG_UF_ESC
Categorical

HIGH CORRELATION
MISSING

Distinct: 27
Distinct (%): < 0.1%
Missing: 4878540
Missing (%): 84.4%
Memory size: 44.1 MiB

Value              Count
SP                 167183
CE                 104981
MG                 71614
RJ                 62583
PE                 47583
Other values (22)  450625

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BA
2nd row: SP
3rd row: BA
4th row: AM
5th row: PI

Common Values

Value              Count    Frequency (%)
SP                 167183   2.9%
CE                 104981   1.8%
MG                 71614    1.2%
RJ                 62583    1.1%
PE                 47583    0.8%
BA                 47235    0.8%
PR                 43060    0.7%
GO                 39769    0.7%
RS                 36478    0.6%
PA                 33109    0.6%
Other values (17)  250974   4.3%
(Missing)          4878540  84.4%

Length

Figure: Histogram of lengths of the category

Value              Count   Frequency (%)
sp                 167183  18.5%
ce                 104981  11.6%
mg                 71614   7.9%
rj                 62583   6.9%
pe                 47583   5.3%
ba                 47235   5.2%
pr                 43060   4.8%
go                 39769   4.4%
rs                 36478   4.0%
pa                 33109   3.7%
Other values (17)  250974  27.7%

Most occurring characters, categories, scripts, and blocks: No values found.

TP_DEPENDENCIA_ADM_ESC
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 4878540
Missing (%): 84.4%
Memory size: 44.1 MiB

Value  Count
2.0    630939
4.0    218705
1.0    46594
3.0    8331

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 3.0
3rd row: 2.0
4th row: 2.0
5th row: 1.0

Common Values

Value      Count    Frequency (%)
2.0        630939   10.9%
4.0        218705   3.8%
1.0        46594    0.8%
3.0        8331     0.1%
(Missing)  4878540  84.4%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
2.0    630939  69.8%
4.0    218705  24.2%
1.0    46594   5.2%
3.0    8331    0.9%

Most occurring characters, categories, scripts, and blocks: No values found.

TP_LOCALIZACAO_ESC
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 4878540
Missing (%): 84.4%
Memory size: 44.1 MiB

Value  Count
1.0    873348
2.0    31221

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value      Count    Frequency (%)
1.0        873348   15.1%
2.0        31221    0.5%
(Missing)  4878540  84.4%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
1.0    873348  96.5%
2.0    31221   3.5%

Most occurring characters, categories, scripts, and blocks: No values found.

TP_SIT_FUNC_ESC
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 4878540
Missing (%): 84.4%
Memory size: 44.1 MiB

Value  Count
1.0    899455
4.0    2867
2.0    1878
3.0    369

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value      Count    Frequency (%)
1.0        899455   15.6%
4.0        2867     < 0.1%
2.0        1878     < 0.1%
3.0        369      < 0.1%
(Missing)  4878540  84.4%

Length

Figure: Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
1.0    899455  99.4%
4.0    2867    0.3%
2.0    1878    0.2%
3.0    369     < 0.1%

Most occurring characters, categories, scripts, and blocks: No values found.

CO_MUNICIPIO_PROVA
Real number (ℝ≥0)

Distinct: 1747
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3086786.96
Minimum: 1100015
Maximum: 5300108
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.1 MiB

Quantile statistics

Minimum: 1100015
5-th percentile: 1400100
Q1: 2402204
median: 3121605
Q3: 3550308
95-th percentile: 5204508
Maximum: 5300108
Range: 4200093
Interquartile range (IQR): 1148104

Descriptive statistics

Standard deviation: 1020662.769
Coefficient of variation (CV): 0.3306553973
Kurtosis: -0.2275394886
Mean: 3086786.96
Median Absolute Deviation (MAD): 518398
Skewness: 0.2682873398
Sum: 1.785122545 × 10^13
Variance: 1.041752487 × 10^12
Monotonicity: Not monotonic

Figure: Histogram with fixed size bins (bins=50)

Most frequent values

Value                Count    Frequency (%)
3550308              250475   4.3%
3304557              156727   2.7%
5300108              116932   2.0%
1302603              114037   2.0%
2304400              108508   1.9%
2927408              100075   1.7%
3106200              98720    1.7%
1501402              94499    1.6%
2111300              82212    1.4%
2611606              72652    1.3%
Other values (1737)  4588272  79.3%

Smallest values

Value    Count  Frequency (%)
1100015  619    < 0.1%
1100023  4164   0.1%
1100049  4893   0.1%
1100056  637    < 0.1%
1100064  668    < 0.1%
1100098  969    < 0.1%
1100106  3054   0.1%
1100114  2309   < 0.1%
1100122  5881   0.1%
1100130  869    < 0.1%

Largest values

Value    Count   Frequency (%)
5300108  116932  2.0%
5221858  6756    0.1%
5221601  2371    < 0.1%
5221403  6037    0.1%
5220603  1157    < 0.1%
5220454  4236    0.1%
5220207  1111    < 0.1%
5220108  2305    < 0.1%
5219753  1691    < 0.1%
5219308  1239    < 0.1%

NO_MUNICIPIO_PROVA
Categorical

HIGH CARDINALITY

Distinct: 1712
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 MiB

Value                Count
São Paulo            250475
Rio de Janeiro       156727
Brasília             116932
Manaus               114037
Fortaleza            108508
Other values (1707)  5036430

Length

Max length: 30
Median length: 9
Mean length: 9.954416733
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Belém
2nd row: Natal
3rd row: Salvador
4th row: Santana de Parnaíba
5th row: Diamantina

Common Values

Value                Count    Frequency (%)
São Paulo            250475   4.3%
Rio de Janeiro       156727   2.7%
Brasília             116932   2.0%
Manaus               114037   2.0%
Fortaleza            108508   1.9%
Salvador             100075   1.7%
Belo Horizonte       98720    1.7%
Belém                94499    1.6%
São Luís             82212    1.4%
Recife               72652    1.3%
Other values (1702)  4588272  79.3%

Length

Figure: Histogram of lengths of the category

Value                Count    Frequency (%)
são                  585797   6.3%
de                   375394   4.1%
do                   282275   3.1%
rio                  256893   2.8%
paulo                256030   2.8%
janeiro              156727   1.7%
brasília             118611   1.3%
manaus               114037   1.2%
fortaleza            108508   1.2%
belo                 103824   1.1%
Other values (1620)  6870358  74.4%

Most occurring characters, categories, scripts, and blocks: No values found.

CO_UF_PROVA
Real number (ℝ≥0)

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.72044743
Minimum: 11
Maximum: 53
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.1 MiB

Quantile statistics

Minimum: 11
5-th percentile: 14
Q1: 24
median: 31
Q3: 35
95-th percentile: 52
Maximum: 53
Range: 42
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 10.17673193
Coefficient of variation (CV): 0.3312690009
Kurtosis: -0.1963210569
Mean: 30.72044743
Median Absolute Deviation (MAD): 5
Skewness: 0.2946672371
Sum: 177659696
Variance: 103.5658727
Monotonicity: Not monotonic

Figure: Histogram with fixed size bins (bins=27)

Most frequent values

Value              Count    Frequency (%)
35                 910492   15.7%
31                 577211   10.0%
29                 447691   7.7%
33                 387480   6.7%
15                 330883   5.7%
23                 325680   5.6%
26                 315569   5.5%
43                 249130   4.3%
41                 239635   4.1%
21                 238272   4.1%
Other values (17)  1761066  30.5%

Smallest values

Value  Count   Frequency (%)
11     69594   1.2%
12     41824   0.7%
13     163426  2.8%
14     16885   0.3%
15     330883  5.7%
16     47263   0.8%
17     59209   1.0%
21     238272  4.1%
22     134678  2.3%
23     325680  5.6%

Largest values

Value  Count   Frequency (%)
53     116932  2.0%
52     211069  3.6%
51     101727  1.8%
50     84548   1.5%
43     249130  4.3%
42     121153  2.1%
41     239635  4.1%
35     910492  15.7%
33     387480  6.7%
32     105812  1.8%

SG_UF_PROVA
Categorical

HIGH CORRELATION

Distinct27
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size44.1 MiB
SP
910492 
MG
577211 
BA
447691 
RJ
387480 
PA
330883 
Other values (22)
3129352 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPA
2nd rowRN
3rd rowBA
4th rowSP
5th rowMG

Common Values

Value               Count     Frequency (%)
SP                  910492    15.7%
MG                  577211    10.0%
BA                  447691    7.7%
RJ                  387480    6.7%
PA                  330883    5.7%
CE                  325680    5.6%
PE                  315569    5.5%
RS                  249130    4.3%
PR                  239635    4.1%
MA                  238272    4.1%
Other values (17)   1761066   30.5%

Length

Histogram of lengths of the category
Value               Count     Frequency (%)
sp                  910492    15.7%
mg                  577211    10.0%
ba                  447691    7.7%
rj                  387480    6.7%
pa                  330883    5.7%
ce                  325680    5.6%
pe                  315569    5.5%
rs                  249130    4.3%
pr                  239635    4.1%
ma                  238272    4.1%
Other values (17)   1761066   30.5%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q001
Categorical

HIGH CORRELATION
MISSING

Distinct8
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
E
1444832 
B
1324198 
C
839009 
D
615284 
H
505033 
Other values (3)
958787 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB
2nd rowB
3rd rowC
4th rowB
5th rowA

Common Values

Value               Count     Frequency (%)
E                   1444832   25.0%
B                   1324198   22.9%
C                   839009    14.5%
D                   615284    10.6%
H                   505033    8.7%
F                   382851    6.6%
A                   336216    5.8%
G                   239720    4.1%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
e                   1444832   25.4%
b                   1324198   23.3%
c                   839009    14.8%
d                   615284    10.8%
h                   505033    8.9%
f                   382851    6.7%
a                   336216    5.9%
g                   239720    4.2%
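
The percentages in this table are slightly higher than those under Common Values. The arithmetic suggests the two tables differ only in the denominator: Common Values divides by all observations, while the pie chart table divides by the answered (non-missing) rows. The sketch below checks this for Q001; the function name `shares` is illustrative, and the total row count is taken from the report overview.

```python
n_rows = 5783109      # number of observations, from the report overview
n_missing = 95966     # missing Q001 answers

# Q001 counts as reported in this section.
counts = {"E": 1444832, "B": 1324198, "C": 839009, "D": 615284,
          "H": 505033, "F": 382851, "A": 336216, "G": 239720}

def shares(counts, denominator):
    """Percentage share of each category, rounded to one decimal."""
    return {k: round(100 * v / denominator, 1) for k, v in counts.items()}

with_missing = shares(counts, n_rows)               # Common Values table
answered_only = shares(counts, n_rows - n_missing)  # pie chart table

for key in counts:
    print(key, with_missing[key], answered_only[key])
```

The same relationship should hold for the other questionnaire variables, which all report 95966 missing values.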

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q002
Categorical

HIGH CORRELATION
MISSING

Distinct8
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
E
1796085 
B
1044422 
C
788103 
D
697557 
F
532008 
Other values (3)
828968 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowC
2nd rowG
3rd rowB
4th rowC
5th rowA

Common Values

Value               Count     Frequency (%)
E                   1796085   31.1%
B                   1044422   18.1%
C                   788103    13.6%
D                   697557    12.1%
F                   532008    9.2%
G                   434437    7.5%
A                   239085    4.1%
H                   155446    2.7%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
e                   1796085   31.6%
b                   1044422   18.4%
c                   788103    13.9%
d                   697557    12.3%
f                   532008    9.4%
g                   434437    7.6%
a                   239085    4.2%
h                   155446    2.7%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q003
Categorical

HIGH CORRELATION
MISSING

Distinct6
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
C
1350641 
B
1279458 
A
1218913 
D
893474 
F
677287 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowB
3rd rowC
4th rowC
5th rowB

Common Values

Value               Count     Frequency (%)
C                   1350641   23.4%
B                   1279458   22.1%
A                   1218913   21.1%
D                   893474    15.4%
F                   677287    11.7%
E                   267370    4.6%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
c                   1350641   23.7%
b                   1279458   22.5%
a                   1218913   21.4%
d                   893474    15.7%
f                   677287    11.9%
e                   267370    4.7%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q004
Categorical

HIGH CORRELATION
MISSING

Distinct6
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
B
2582143 
D
1082105 
A
993606 
F
467901 
C
374634 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowD
2nd rowB
3rd rowB
4th rowB
5th rowA

Common Values

Value               Count     Frequency (%)
B                   2582143   44.6%
D                   1082105   18.7%
A                   993606    17.2%
F                   467901    8.1%
C                   374634    6.5%
E                   186754    3.2%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
b                   2582143   45.4%
d                   1082105   19.0%
a                   993606    17.5%
f                   467901    8.2%
c                   374634    6.6%
e                   186754    3.3%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q005
Real number (ℝ≥0)

MISSING

Distinct            20
Distinct (%)        < 0.1%
Missing             95966
Missing (%)         1.7%
Infinite            0
Infinite (%)        0.0%
Mean                3.711457053
Minimum             1
Maximum             20
Zeros               0
Zeros (%)           0.0%
Negative            0
Negative (%)        0.0%
Memory size         44.1 MiB

Quantile statistics

Minimum                      1
5-th percentile              2
Q1                           3
Median                       4
Q3                           4
95-th percentile             6
Maximum                      20
Range                        19
Interquartile range (IQR)    1

Descriptive statistics

Standard deviation                 1.474033247
Coefficient of variation (CV)      0.3971575654
Kurtosis                           4.353382356
Mean                               3.711457053
Median Absolute Deviation (MAD)    1
Skewness                           1.049949945
Sum                                21107587
Variance                           2.172774014
Monotonicity                       Not monotonic
Histogram with fixed size bins (bins=20)
Value               Count     Frequency (%)
4                   1735926   30.0%
3                   1496056   25.9%
2                   847866    14.7%
5                   840828    14.5%
6                   307864    5.3%
1                   244798    4.2%
7                   118914    2.1%
8                   51959     0.9%
9                   20392     0.4%
10                  12194     0.2%
Other values (10)   10346     0.2%
(Missing)           95966     1.7%

Value               Count     Frequency (%)
1                   244798    4.2%
2                   847866    14.7%
3                   1496056   25.9%
4                   1735926   30.0%
5                   840828    14.5%
6                   307864    5.3%
7                   118914    2.1%
8                   51959     0.9%
9                   20392     0.4%
10                  12194     0.2%

Value               Count     Frequency (%)
20                  464       < 0.1%
19                  64        < 0.1%
18                  98        < 0.1%
17                  117       < 0.1%
16                  225       < 0.1%
15                  575       < 0.1%
14                  663       < 0.1%
13                  1162      < 0.1%
12                  2703      < 0.1%
11                  4275      0.1%

Q006
Categorical

HIGH CORRELATION
MISSING

Distinct17
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
B
1884219 
C
1039889 
D
723147 
A
459211 
F
355481 
Other values (12)
1225196 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB
2nd rowB
3rd rowC
4th rowB
5th rowB

Common Values

Value               Count     Frequency (%)
B                   1884219   32.6%
C                   1039889   18.0%
D                   723147    12.5%
A                   459211    7.9%
F                   355481    6.1%
E                   339346    5.9%
G                   252580    4.4%
H                   171316    3.0%
I                   98901     1.7%
J                   58292     1.0%
Other values (7)    304761    5.3%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category
Value               Count     Frequency (%)
b                   1884219   33.1%
c                   1039889   18.3%
d                   723147    12.7%
a                   459211    8.1%
f                   355481    6.3%
e                   339346    6.0%
g                   252580    4.4%
h                   171316    3.0%
i                   98901     1.7%
j                   58292     1.0%
Other values (7)    304761    5.4%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q007
Categorical

HIGH CORRELATION
MISSING

Distinct4
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
A
5323495 
B
 
199562
D
 
119094
C
 
44992

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowA
3rd rowA
4th rowA
5th rowA

Common Values

Value               Count     Frequency (%)
A                   5323495   92.1%
B                   199562    3.5%
D                   119094    2.1%
C                   44992     0.8%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
a                   5323495   93.6%
b                   199562    3.5%
d                   119094    2.1%
c                   44992     0.8%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q008
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
B
4171310 
C
1066754 
D
 
266477
E
 
137550
A
 
45052

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB
2nd rowB
3rd rowB
4th rowB
5th rowB

Common Values

Value               Count     Frequency (%)
B                   4171310   72.1%
C                   1066754   18.4%
D                   266477    4.6%
E                   137550    2.4%
A                   45052     0.8%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
b                   4171310   73.3%
c                   1066754   18.8%
d                   266477    4.7%
e                   137550    2.4%
a                   45052     0.8%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q009
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
C
2919197 
D
1578254 
B
905615 
E
 
233596
A
 
50481

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowD
3rd rowE
4th rowC
5th rowB

Common Values

Value               Count     Frequency (%)
C                   2919197   50.5%
D                   1578254   27.3%
B                   905615    15.7%
E                   233596    4.0%
A                   50481     0.9%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
c                   2919197   51.3%
d                   1578254   27.8%
b                   905615    15.9%
e                   233596    4.1%
a                   50481     0.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q010
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
A
3409682 
B
1860772 
C
360100 
D
 
46698
E
 
9891

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowA
3rd rowA
4th rowA
5th rowA

Common Values

Value               Count     Frequency (%)
A                   3409682   59.0%
B                   1860772   32.2%
C                   360100    6.2%
D                   46698     0.8%
E                   9891      0.2%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
a                   3409682   60.0%
b                   1860772   32.7%
c                   360100    6.3%
d                   46698     0.8%
e                   9891      0.2%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q011
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
A
4439673 
B
1135229 
C
 
101890
D
 
9002
E
 
1349

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowA
3rd rowA
4th rowA
5th rowA

Common Values

Value               Count     Frequency (%)
A                   4439673   76.8%
B                   1135229   19.6%
C                   101890    1.8%
D                   9002      0.2%
E                   1349      < 0.1%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
a                   4439673   78.1%
b                   1135229   20.0%
c                   101890    1.8%
d                   9002      0.2%
e                   1349      < 0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q012
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
B
5391398 
C
 
187345
A
 
93629
D
 
12312
E
 
2459

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB
2nd rowB
3rd rowB
4th rowB
5th rowB

Common Values

Value               Count     Frequency (%)
B                   5391398   93.2%
C                   187345    3.2%
A                   93629     1.6%
D                   12312     0.2%
E                   2459      < 0.1%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
b                   5391398   94.8%
c                   187345    3.3%
a                   93629     1.6%
d                   12312     0.2%
e                   2459      < 0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q013
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
A
3607444 
B
1953411 
C
 
110454
D
 
12980
E
 
2854

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowA
3rd rowA
4th rowA
5th rowA

Common Values

Value               Count     Frequency (%)
A                   3607444   62.4%
B                   1953411   33.8%
C                   110454    1.9%
D                   12980     0.2%
E                   2854      < 0.1%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
a                   3607444   63.4%
b                   1953411   34.3%
c                   110454    1.9%
d                   12980     0.2%
e                   2854      0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q014
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
B
3185213 
A
2460301 
C
 
39904
D
 
1428
E
 
297

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB
2nd rowB
3rd rowA
4th rowB
5th rowA

Common Values

Value               Count     Frequency (%)
B                   3185213   55.1%
A                   2460301   42.5%
C                   39904     0.7%
D                   1428      < 0.1%
E                   297       < 0.1%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
b                   3185213   56.0%
a                   2460301   43.3%
c                   39904     0.7%
d                   1428      < 0.1%
e                   297       < 0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q015
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
A
5100067 
B
580995 
C
 
5453
D
 
414
E
 
214

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowA
3rd rowA
4th rowA
5th rowA

Common Values

Value               Count     Frequency (%)
A                   5100067   88.2%
B                   580995    10.0%
C                   5453      0.1%
D                   414       < 0.1%
E                   214       < 0.1%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
a                   5100067   89.7%
b                   580995    10.2%
c                   5453      0.1%
d                   414       < 0.1%
e                   214       < 0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q016
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
A
3092401 
B
2571342 
C
 
22096
D
 
992
E
 
312

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB
2nd rowA
3rd rowB
4th rowA
5th rowA

Common Values

Value               Count     Frequency (%)
A                   3092401   53.5%
B                   2571342   44.5%
C                   22096     0.4%
D                   992       < 0.1%
E                   312       < 0.1%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
a                   3092401   54.4%
b                   2571342   45.2%
c                   22096     0.4%
d                   992       < 0.1%
e                   312       < 0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q017
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
A
5565114 
B
 
119947
C
 
1658
D
 
232
E
 
192

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowA
3rd rowA
4th rowA
5th rowA

Common Values

Value               Count     Frequency (%)
A                   5565114   96.2%
B                   119947    2.1%
C                   1658      < 0.1%
D                   232       < 0.1%
E                   192       < 0.1%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
a                   5565114   97.9%
b                   119947    2.1%
c                   1658      < 0.1%
d                   232       < 0.1%
e                   192       < 0.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q018
Categorical

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
A
4684103 
B
1003040 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowA
3rd rowA
4th rowA
5th rowA

Common Values

Value               Count     Frequency (%)
A                   4684103   81.0%
B                   1003040   17.3%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
a                   4684103   82.4%
b                   1003040   17.6%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q019
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
B
4046928 
C
923833 
A
 
314552
D
 
277625
E
 
124205

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB
2nd rowB
3rd rowB
4th rowB
5th rowB

Common Values

Value               Count     Frequency (%)
B                   4046928   70.0%
C                   923833    16.0%
A                   314552    5.4%
D                   277625    4.8%
E                   124205    2.1%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
b                   4046928   71.2%
c                   923833    16.2%
a                   314552    5.5%
d                   277625    4.9%
e                   124205    2.2%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q020
Categorical

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
A
4430283 
B
1256860 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowA
3rd rowB
4th rowA
5th rowA

Common Values

Value               Count     Frequency (%)
A                   4430283   76.6%
B                   1256860   21.7%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
a                   4430283   77.9%
b                   1256860   22.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q021
Categorical

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
A
4645079 
B
1042064 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowB
3rd rowA
4th rowA
5th rowA

Common Values

Value               Count     Frequency (%)
A                   4645079   80.3%
B                   1042064   18.0%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
a                   4645079   81.7%
b                   1042064   18.3%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q022
Categorical

HIGH CORRELATION
MISSING

Distinct5
Distinct (%)< 0.1%
Missing95966
Missing (%)1.7%
Memory size44.1 MiB
C
1814775 
D
1479427 
B
1329369 
E
949024 
A
 
114548

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowB
3rd rowD
4th rowC
5th rowA

Common Values

Value               Count     Frequency (%)
C                   1814775   31.4%
D                   1479427   25.6%
B                   1329369   23.0%
E                   949024    16.4%
A                   114548    2.0%
(Missing)           95966     1.7%

Length

Histogram of lengths of the category

Pie chart

Value               Count     Frequency (%)
c                   1814775   31.9%
d                   1479427   26.0%
b                   1329369   23.4%
e                   949024    16.7%
a                   114548    2.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Q023
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 95966
Missing (%): 1.7%
Memory size: 44.1 MiB

Value | Count
A | 4710317
B | 976826

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: A
3rd row: A
4th row: A
5th row: A

Common Values

Value | Count | Frequency (%)
A | 4710317 | 81.4%
B | 976826 | 16.9%
(Missing) | 95966 | 1.7%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 4710317 | 82.8%
b | 976826 | 17.2%

Most occurring characters, categories, scripts, and blocks

No values found.

Q024
Categorical

HIGH CORRELATION
MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 95966
Missing (%): 1.7%
Memory size: 44.1 MiB

Value | Count
A | 2844982
B | 2271989
C | 389736
D | 123930
E | 56506

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: A
3rd row: B
4th row: A
5th row: A

Common Values

Value | Count | Frequency (%)
A | 2844982 | 49.2%
B | 2271989 | 39.3%
C | 389736 | 6.7%
D | 123930 | 2.1%
E | 56506 | 1.0%
(Missing) | 95966 | 1.7%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 2844982 | 50.0%
b | 2271989 | 39.9%
c | 389736 | 6.9%
d | 123930 | 2.2%
e | 56506 | 1.0%

Most occurring characters, categories, scripts, and blocks

No values found.

Q025
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 95966
Missing (%): 1.7%
Memory size: 44.1 MiB

Value | Count
B | 4634811
A | 1052332

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: B
3rd row: B
4th row: A
5th row: B

Common Values

Value | Count | Frequency (%)
B | 4634811 | 80.1%
A | 1052332 | 18.2%
(Missing) | 95966 | 1.7%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
b | 4634811 | 81.5%
a | 1052332 | 18.5%

Most occurring characters, categories, scripts, and blocks

No values found.

falta
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 44.1 MiB

Value | Count
1 | 3192751
0 | 2590358

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 3192751 | 55.2%
0 | 2590358 | 44.8%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1 | 3192751 | 55.2%
0 | 2590358 | 44.8%

Most occurring characters, categories, scripts, and blocks

No values found.

Interactions

[36 interaction plots]

Correlations


Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so this report uses the bias-corrected measure proposed by Bergsma in 2013.
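
A minimal sketch of a bias-corrected Cramér's V, using only the standard library. The function name `cramers_v_corrected` and the toy inputs are illustrative and are not taken from the report or from the profiling library. The correction follows the commonly cited form of Bergsma's adjustment, and the report's own implementation may differ in details such as how missing values are handled.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    # Drop pairs where either value is missing (a choice made for this sketch).
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    observed = Counter(pairs)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # undefined for constant columns; 0.0 is this sketch's convention

    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(x, y)] - expected) ** 2 / expected

    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy data: ys is fully determined by xs, so the result is close to 1.
xs = ["A", "B"] * 50
ys = ["1", "0"] * 50
print(round(cramers_v_corrected(xs, ys), 3))
```

Returning 0.0 for a column with a single distinct value is a convention chosen here; strictly, the coefficient is undefined when one variable is constant.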

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
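
Nullity correlation can be illustrated as the Pearson correlation between the missing-value indicators of two columns. The sketch below uses only the standard library; the helper name `nullity_correlation` and the toy columns are made up for illustration, and the plotting library behind the heatmap may compute the measure differently. In this report, Q022 through Q025 each show 95966 missing values, which would be consistent with those answers being missing in the same rows, although the counts alone do not prove it.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a column that is never (or always) missing has no defined value
    return cov / sqrt(var_a * var_b)


# Toy columns (made up): None marks a missing value.
q_first = ["A", None, "C", None, "B", "D"]
q_second = ["A", None, "A", None, "B", "A"]   # missing in the same rows as q_first
school = [None, "1", None, "1", None, None]   # present exactly where q_first is missing

print(nullity_correlation(q_first, q_second))  # close to +1
print(nullity_correlation(q_first, school))    # close to -1
```

A value near +1 means the two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing.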

Sample

First rows

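(index) | NU_ANO | TP_SEXO | TP_ESTADO_CIVIL | TP_COR_RACA | TP_NACIONALIDADE | TP_ST_CONCLUSAO | TP_ESCOLA | TP_ENSINO | IN_TREINEIRO | CO_MUNICIPIO_ESC | NO_MUNICIPIO_ESC | CO_UF_ESC | SG_UF_ESC | TP_DEPENDENCIA_ADM_ESC | TP_LOCALIZACAO_ESC | TP_SIT_FUNC_ESC | CO_MUNICIPIO_PROVA | NO_MUNICIPIO_PROVA | CO_UF_PROVA | SG_UF_PROVA | Q001 | Q002 | Q003 | Q004 | Q005 | Q006 | Q007 | Q008 | Q009 | Q010 | Q011 | Q012 | Q013 | Q014 | Q015 | Q016 | Q017 | Q018 | Q019 | Q020 | Q021 | Q022 | Q023 | Q024 | Q025 | falta
0 | 2020 | F | 1 | 2 | 1 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1501402 | Belém | 15 | PA | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1
1 | 2020 | M | 2 | 3 | 1 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2408102 | Natal | 24 | RN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0
2 | 2020 | F | 2 | 3 | 2 | 2 | 2 | 1.0 | 0 | 2927408.0 | Salvador | 29.0 | BA | 2.0 | 1.0 | 1.0 | 2927408 | Salvador | 29 | BA | B | C | A | D | 3.0 | B | A | B | A | A | A | B | A | B | A | B | A | A | B | A | A | A | A | A | A | 1
3 | 2020 | M | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 3547304.0 | Santana de Parnaíba | 35.0 | SP | 3.0 | 1.0 | 1.0 | 3547304 | Santana de Parnaíba | 35 | SP | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0
4 | 2020 | F | 1 | 3 | 2 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3121605 | Diamantina | 31 | MG | B | G | B | B | 3.0 | B | A | B | D | A | A | B | A | B | A | A | A | A | B | A | B | B | A | A | B | 1
5 | 2020 | F | 1 | 3 | 1 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4305207 | Cerro Largo | 43 | RS | C | B | C | B | 5.0 | C | A | B | E | A | A | B | A | A | A | B | A | A | B | B | A | D | A | B | B | 0
6 | 2020 | M | 1 | 3 | 1 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2611606 | Recife | 26 | PE | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1
7 | 2020 | M | 1 | 1 | 1 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3550308 | São Paulo | 35 | SP | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0
8 | 2020 | F | 2 | 3 | 1 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2507507 | João Pessoa | 25 | PB | B | C | C | B | 2.0 | B | A | B | C | A | A | B | A | B | A | A | A | A | B | A | A | C | A | A | A | 0
9 | 2020 | M | 1 | 2 | 1 | 2 | 2 | 1.0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2304400 | Fortaleza | 23 | CE | A | A | B | A | 4.0 | B | A | B | B | A | A | B | A | A | A | A | A | A | B | A | A | A | A | A | B | 1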

Last rows

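(index) | NU_ANO | TP_SEXO | TP_ESTADO_CIVIL | TP_COR_RACA | TP_NACIONALIDADE | TP_ST_CONCLUSAO | TP_ESCOLA | TP_ENSINO | IN_TREINEIRO | CO_MUNICIPIO_ESC | NO_MUNICIPIO_ESC | CO_UF_ESC | SG_UF_ESC | TP_DEPENDENCIA_ADM_ESC | TP_LOCALIZACAO_ESC | TP_SIT_FUNC_ESC | CO_MUNICIPIO_PROVA | NO_MUNICIPIO_PROVA | CO_UF_PROVA | SG_UF_PROVA | Q001 | Q002 | Q003 | Q004 | Q005 | Q006 | Q007 | Q008 | Q009 | Q010 | Q011 | Q012 | Q013 | Q014 | Q015 | Q016 | Q017 | Q018 | Q019 | Q020 | Q021 | Q022 | Q023 | Q024 | Q025 | falta
5783099 | 2020 | M | 2 | 1 | 1 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5102504 | Cáceres | 51 | MT | C | C | D | C | 3.0 | E | A | B | C | B | B | B | A | B | A | B | A | A | C | B | A | C | B | A | B | 1
5783100 | 2020 | M | 0 | 0 | 1 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3122306 | Divinópolis | 31 | MG | E | E | D | D | 3.0 | F | A | C | C | A | A | B | A | A | A | B | A | A | B | B | A | C | A | A | A | 1
5783101 | 2020 | F | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 1500800.0 | Ananindeua | 15.0 | PA | 2.0 | 1.0 | 1.0 | 1500800 | Ananindeua | 15 | PA | D | D | C | B | 4.0 | C | A | B | C | A | B | B | A | A | A | B | A | A | B | A | A | D | A | A | A | 0
5783102 | 2020 | F | 3 | 3 | 1 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2603454 | Camaragibe | 26 | PE | C | C | C | B | 4.0 | C | A | B | C | A | A | B | A | B | A | B | A | A | B | A | A | C | B | B | B | 1
5783103 | 2020 | M | 1 | 1 | 1 | 2 | 2 | 1.0 | 0 | 3530300.0 | Mirassol | 35.0 | SP | 2.0 | 1.0 | 1.0 | 3530300 | Mirassol | 35 | SP | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0
5783104 | 2020 | F | 1 | 1 | 1 | 2 | 2 | 1.0 | 0 | 2304103.0 | Crateús | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2304103 | Crateús | 23 | CE | E | G | A | D | 5.0 | G | A | C | C | B | B | B | A | A | A | B | A | A | B | A | A | D | A | B | B | 0
5783105 | 2020 | M | 1 | 1 | 1 | 2 | 2 | 1.0 | 0 | 5008305.0 | Três Lagoas | 50.0 | MS | 1.0 | 1.0 | 1.0 | 5008305 | Três Lagoas | 50 | MS | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0
5783106 | 2020 | F | 1 | 1 | 1 | 1 | 1 | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3538808 | Piraju | 35 | SP | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0
5783107 | 2020 | F | 1 | 1 | 1 | 2 | 3 | 1.0 | 0 | 5103403.0 | Cuiabá | 51.0 | MT | 4.0 | 1.0 | 1.0 | 5103403 | Cuiabá | 51 | MT | G | G | E | E | 5.0 | N | A | D | D | C | A | B | C | B | B | B | B | B | D | B | B | E | A | D | B | 1
5783108 | 2020 | F | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2111300.0 | São Luís | 21.0 | MA | 2.0 | 1.0 | 1.0 | 2111300 | São Luís | 21 | MA | F | D | D | B | 5.0 | C | A | B | C | B | A | B | B | A | A | A | A | A | B | B | A | D | A | A | B | 0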

Duplicate rows

Most frequently occurring

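(index) | NU_ANO | TP_SEXO | TP_ESTADO_CIVIL | TP_COR_RACA | TP_NACIONALIDADE | TP_ST_CONCLUSAO | TP_ESCOLA | TP_ENSINO | IN_TREINEIRO | CO_MUNICIPIO_ESC | NO_MUNICIPIO_ESC | CO_UF_ESC | SG_UF_ESC | TP_DEPENDENCIA_ADM_ESC | TP_LOCALIZACAO_ESC | TP_SIT_FUNC_ESC | CO_MUNICIPIO_PROVA | NO_MUNICIPIO_PROVA | CO_UF_PROVA | SG_UF_PROVA | Q001 | Q002 | Q003 | Q004 | Q005 | Q006 | Q007 | Q008 | Q009 | Q010 | Q011 | Q012 | Q013 | Q014 | Q015 | Q016 | Q017 | Q018 | Q019 | Q020 | Q021 | Q022 | Q023 | Q024 | Q025 | falta | # duplicates
1929 | 2020 | F | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2304400.0 | Fortaleza | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2304400 | Fortaleza | 23 | CE | B | B | B | B | 4.0 | B | A | B | B | A | A | B | A | A | A | A | A | A | B | A | A | B | A | A | A | 0 | 72
5700 | 2020 | M | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2304400.0 | Fortaleza | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2304400 | Fortaleza | 23 | CE | B | B | B | B | 4.0 | B | A | B | B | A | A | B | A | A | A | A | A | A | B | A | A | B | A | A | A | 1 | 71
2381 | 2020 | F | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2304954.0 | Guaiúba | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2309706 | Pacatuba | 23 | CE | B | B | A | A | 4.0 | B | A | B | B | A | A | B | A | A | A | A | A | A | B | A | A | B | A | A | B | 1 | 69
3063 | 2020 | F | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2311306.0 | Quixadá | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2311306 | Quixadá | 23 | CE | H | H | F | F | 4.0 | B | A | B | C | A | A | B | A | A | A | A | A | A | B | A | A | C | A | A | A | 1 | 60
1930 | 2020 | F | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2304400.0 | Fortaleza | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2304400 | Fortaleza | 23 | CE | B | B | B | B | 4.0 | B | A | B | B | A | A | B | A | A | A | A | A | A | B | A | A | B | A | A | A | 1 | 56
6874 | 2020 | M | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2311306.0 | Quixadá | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2311306 | Quixadá | 23 | CE | H | H | F | F | 4.0 | B | A | B | C | A | A | B | A | A | A | A | A | A | B | A | A | C | A | A | A | 1 | 52
2328 | 2020 | F | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2304400.0 | Fortaleza | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2304400 | Fortaleza | 23 | CE | H | H | F | F | 4.0 | B | A | B | C | A | A | B | A | A | A | A | A | A | B | A | A | C | A | A | A | 1 | 51
6111 | 2020 | M | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2304400.0 | Fortaleza | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2304400 | Fortaleza | 23 | CE | H | H | F | F | 4.0 | B | A | B | C | A | A | B | A | A | A | A | A | A | B | A | A | C | A | A | A | 1 | 51
1916 | 2020 | F | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2304400.0 | Fortaleza | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2304400 | Fortaleza | 23 | CE | B | B | B | B | 3.0 | B | A | B | B | A | A | B | A | A | A | A | A | A | B | A | A | B | A | A | A | 1 | 48
5323 | 2020 | M | 1 | 3 | 1 | 2 | 2 | 1.0 | 0 | 2302206.0 | Beberibe | 23.0 | CE | 2.0 | 1.0 | 1.0 | 2302206 | Beberibe | 23 | CE | H | H | F | F | 3.0 | B | A | B | B | A | A | B | A | A | A | A | A | A | B | A | A | B | A | A | A | 1 | 48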
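
A table like the one above can be produced by counting identical rows and keeping those that occur more than once. The sketch below uses only the standard library and made-up records; the helper `most_frequent_duplicates` is hypothetical. It assumes the "# duplicates" column counts how many times each identical row occurs, which the report does not state explicitly.

```python
from collections import Counter


def most_frequent_duplicates(rows, top=10):
    """Return (row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    duplicated = [(row, count) for row, count in counts.items() if count > 1]
    duplicated.sort(key=lambda item: item[1], reverse=True)
    return duplicated[:top]


# Toy records (made up): (TP_SEXO, SG_UF_PROVA, Q022, falta)
rows = [
    ("F", "CE", "B", 0),
    ("F", "CE", "B", 0),
    ("M", "CE", "B", 1),
    ("F", "CE", "B", 0),
    ("M", "CE", "B", 1),
    ("F", "SP", "A", 0),
]

for row, count in most_frequent_duplicates(rows):
    print(row, count)
```

Rows that differ in any single field, such as `falta`, are counted as separate groups, which matches how otherwise identical rows appear on separate lines in the table.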